62 research outputs found

    Automated Classification of Vowel Category and Speaker Type in the High-Frequency Spectrum

    Get PDF
    The high-frequency region of vowel signals (above the third formant or F3) has received little research attention. Recent evidence, however, has documented the perceptual utility of high-frequency information in the speech signal above the traditional frequency bandwidth known to contain important cues for speech and speaker recognition. The purpose of this study was to determine if high-pass filtered vowels could be separated by vowel category and speaker type in a supervised learning framework. Mel frequency cepstral coefficients (MFCCs) were extracted from productions of six vowel categories produced by two male, two female, and two child speakers. Results revealed that the filtered vowels were well separated by vowel category and speaker type using MFCCs from the high-frequency spectrum. This demonstrates the presence of useful information for automated classification from the high-frequency region and is the first study to report findings of this nature in a supervised learning framework

    Self-supervised Interest Point Detection and Description for Fisheye and Perspective Images

    Full text link
    Keypoint detection and matching is a fundamental task in many computer vision problems, from shape reconstruction, to structure from motion, to AR/VR applications and robotics. It is a well-studied problem with remarkable successes such as SIFT, and more recent deep learning approaches. While great robustness is exhibited by these techniques with respect to noise, illumination variation, and rigid motion transformations, less attention has been placed on image distortion sensitivity. In this work, we focus on the case when this is caused by the geometry of the cameras used for image acquisition, and consider the keypoint detection and matching problem between the hybrid scenario of a fisheye and a projective image. We build on a state-of-the-art approach and derive a self-supervised procedure that enables training an interest point detector and descriptor network. We also collected two new datasets for additional training and testing in this unexplored scenario, and we demonstrate that current approaches are suboptimal because they are designed to work in traditional projective conditions, while the proposed approach turns out to be the most effective.Comment: CVPR Workshop on Omnidirectional Computer Vision, 202

    Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

    Full text link
    Precise 3D environmental mapping is pivotal in robotics. Existing methods often rely on predefined concepts during training or are time-intensive when generating semantic maps. This paper presents Open-Fusion, a groundbreaking approach for real-time open-vocabulary 3D mapping and queryable scene representation using RGB-D data. Open-Fusion harnesses the power of a pre-trained vision-language foundation model (VLFM) for open-set semantic comprehension and employs the Truncated Signed Distance Function (TSDF) for swift 3D scene reconstruction. By leveraging the VLFM, we extract region-based embeddings and their associated confidence maps. These are then integrated with 3D knowledge from TSDF using an enhanced Hungarian-based feature-matching mechanism. Notably, Open-Fusion delivers outstanding annotation-free 3D segmentation for open-vocabulary without necessitating additional 3D training. Benchmark tests on the ScanNet dataset against leading zero-shot methods highlight Open-Fusion's superiority. Furthermore, it seamlessly combines the strengths of region-based VLFM and TSDF, facilitating real-time 3D scene comprehension that includes object concepts and open-world semantics. We encourage the readers to view the demos on our project page: https://uark-aicv.github.io/OpenFusio

    Artificial Intelligence

    No full text

    Machine Learning

    No full text

    Computational Photography

    No full text

    SPTP:Big Data Engineering

    No full text
    • …
    corecore